Skip to content
This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

checkpointer: implement state machine. #755

Merged
merged 3 commits into from
Nov 1, 2017

Conversation

diegs
Copy link
Contributor

@diegs diegs commented Oct 28, 2017

This implements an explicit state machine to capture the various states
a checkpoint can be in. It recovers those states from disk upon startup
(if applicable) and then reconciles them with the states that are
fetched from the various apiservers.

It also adds 2 new states to attempt to workaround issues with
kubernetes 1.8 in which daemonset pods are deleted before being
recreated. For single-master clusters this can lead to a total outage
during apiserver upgrades since the checkpointer will aggressively
retire the checkpoint for the deleted apiserver pod before it has
a chance to schedule the new one.

@diegs diegs added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Oct 28, 2017
@diegs diegs self-assigned this Oct 28, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 28, 2017
@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

coreosbot run e2e checkpointer

@diegs diegs changed the title WIP checkpointer: implement state machine. checkpointer: implement state machine. Oct 31, 2017
@diegs diegs removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Oct 31, 2017
@diegs diegs requested a review from aaronlevy October 31, 2017 04:37
@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

cc @thorfour

@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

coreosbot run e2e checkpointer

@diegs diegs requested a review from dghubble October 31, 2017 05:26
Copy link

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diegs This looks promising. Thanks for the update!

Would you like us to test this on coreos/tectonic-installer#1938 before merging here?

@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

@mxinden i am going to test this against a tectonic cluster this morning.

Copy link
Contributor

@aaronlevy aaronlevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple minor comments, and somewhat quick pass (probably good to have another pair of eyes take a look at)

state: stateNone{},
}
}
// Always overwrite with the localParentPod since it is a better source of truth.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better source of truth than what (and for what)? Do we not just use name/namespace (I wouldn't expect that to change).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's using this pod to write to disk so we want to make sure we have the most updated copy of it right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. Duh.

if !ok || isPodCheckpointer(inactiveCheckpoints[id], checkpointerPod) {
glog.V(4).Infof("Should start checkpoint %s", id)
start = append(start, id)
// The apiserver pod is deleted, transition to stateInactiveGracePeriod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the apiserver parent pod?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for k := range removeMap {
if k == podCheckpointerID {
continue
// The apiserver pod is still deleted, remain in stateInactiveGracePeriod.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apiserver parent pod?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@diegs diegs force-pushed the cp-cooldown branch 3 times, most recently from b9d2b16 to 4544738 Compare October 31, 2017 20:11
@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

coreosbot run e2e checkpointer

@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

@aaronlevy thanks for the comments. Made a few small changes, also testing it against tectonic and it seems to be surviving apiserver destruction nicely

@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

coreosbot run e2e checkpointer

@@ -73,3 +69,11 @@ func writeManifestIfDifferent(path, name string, data []byte) (bool, error) {
glog.Infof("Writing manifest for %q to %q", name, path)
return true, writeAndAtomicRename(path, data, 0644)
}

func writeAndAtomicRename(path string, data []byte, perm os.FileMode) error {
tmpfile := filepath.Join(filepath.Dir(path), "."+filepath.Base(path))
Copy link

@yifan-gu yifan-gu Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I notice that maybe we should not write the file under the directory.
Even with the prefix of ".", it will still be scanned by ReadDir() in func getFileCheckpoints().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yifan-gu since the checkpointer is single-threaded getFileCheckpoints() should not run concurrently. The only problem is if the program crashes in the middle of writeAndAtomicRename and we have a leftover .foo.json file sitting there.

I can adjust getFileCheckpoints() to ignore files that start with ., WDYT?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question is why we are not just using a system temp dir here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yifan-gu Atomic renames are only guaranteed in the same directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yifan-gu added some code in getFileCheckpoints() to clean up old files that start with .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://godoc.org/os#Rename calls rename() says 'OS-specific restrictions may apply when oldpath and newpath are in different directories. If there is an error, it will be of type *LinkError.'

If you look at http://man7.org/linux/man-pages/man2/rename.2.html, EXDEV occurs if the old and new locations are on different filesystems (I've seen this happen before). It's safer to keep it in the same directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ioutil.TempFile would be a smart choice to create a temporary file within a certain directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rphillips the only difference would be that it will choose a random rather than deterministic filename, as we would be calling it with the current directory and . prefix, and then writing to it... I can make the change anyway though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it isn't a hard blocker... just a suggestion if we need to enumerate those files with a better prefix in the future...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh already changed it.

t.Errorf("For test: %s\nExpected start: %s Got: %s\nExpected stop: %s Got: %s\nExpected remove: %s Got: %s\n",
tc.desc, tc.expectStart, gotStart, tc.expectStop, gotStop, tc.expectRemove, gotRemove)
!reflect.DeepEqual(tc.expectRemove, gotRemove) ||
!reflect.DeepEqual(tc.expectGraceStart, gotGraceStart) ||

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe just print them one by one? I mean:

if !reflect.DeepEqual() {
    t.Errorf()
}
if !reflect.DeepEqual() {
    t.Errorf()
}
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple states this can get a little verbose. I prefer to keep it as is.

return starts, stops, removes
}

// stateSelfCheckpoint represents a checkpoint of the checkpointer itself, which has special

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe define all these states in a separate file? this process.go is getting big, I think it would be cleaner to just keep control logics here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

// state represents the current state of a checkpoint.
type state interface {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was a little confused by the name at first because we also have apiState, either change apiState to something else like apiCondition, or rename this to podState?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (both)

@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

@yifan-gu thanks for the review! PTAL

@diegs
Copy link
Contributor Author

diegs commented Oct 31, 2017

coreosbot run e2e checkpointer

s-urbaniak pushed a commit to s-urbaniak/tectonic-installer that referenced this pull request Nov 1, 2017
@s-urbaniak
Copy link

Thanks a lot @diegs, first time coreos/tectonic-installer#1938 passed green with this one 👍

s-urbaniak pushed a commit to s-urbaniak/tectonic-installer that referenced this pull request Nov 1, 2017
s-urbaniak pushed a commit to s-urbaniak/tectonic-installer that referenced this pull request Nov 1, 2017
// behavior.
//
// stateSelfCheckpoint can transition to stateInactiveGracePeriod.
type stateSelfCheckpoint struct{}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: stateSelfCheckpointActive ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link

@yifan-gu yifan-gu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction, overall LGTM.
I hope to see some more unit tests that exercises all the transition path between those states in the future, not a blocker.

This implements an explicit state machine to capture the various states
a checkpoint can be in. It recovers those states from disk upon startup
(if applicable) and then reconciles them with the states that are
fetched from the various apiservers.

It also adds 2 new states to attempt to work around issues with
kubernetes 1.8 in which daemonset pods are deleted before being
recreated. For single-master clusters this can lead to a total outage
during apiserver upgrades since the checkpointer will aggressively
retire the checkpoint for the deleted apiserver pod before it has
a chance to schedule the new one.
@diegs
Copy link
Contributor Author

diegs commented Nov 1, 2017

@yifan-gu thanks, I added a simple test to enforce only allowed transitions can occur. It doesn't actually check behavior (I was relying on the existing test for that) but I think it's a small improvement for now.

@diegs
Copy link
Contributor Author

diegs commented Nov 1, 2017

coreosbot run e2e checkpointer

Copy link

@yifan-gu yifan-gu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

If the checkpointer crashes during `writeAndAtomicRename()` then it
may leave temporary files that start with `.`. This checks for and
removes them.

Also changes `writeAndAtomicRename()` to use `ioutil.TempFile()`.
@diegs
Copy link
Contributor Author

diegs commented Nov 1, 2017

coreosbot run e2e checkpointer

@diegs
Copy link
Contributor Author

diegs commented Nov 1, 2017

coreosbot run e2e checkpointer

@diegs diegs merged commit 69f99ef into kubernetes-retired:master Nov 1, 2017
@diegs diegs deleted the cp-cooldown branch November 1, 2017 22:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants